Zipf's law is the most common statistical distribution displaying scaling behavior. Cities,
populations or firms are just examples of this seemingly universal law. Although many different models
have been proposed, no general theoretical explanation has been shown to exist for its universality. 
Here we show that Zipf's law is, in fact, an inevitable outcome of a very general class of stochastic 
systems. Borrowing concepts from Algorithmic Information Theory, our derivation is based on the 
properties of the symbolic sequence obtained through successive observations over a system with an 
unbounded number of possible states. Specifically, we assume that the complexity of the description
of the system provided by the sequence of observations is the one expected for a system evolving to a 
stable state between order and disorder. This result is obtained from a small set of mild, physically 
relevant assumptions. The general nature of our derivation and its model-free basis would explain 
the ubiquity of such a law in real systems. 



I. INTRODUCTION 

Scaling laws are common in both natural and artificial systems [1]. Their ubiquity and
universality are among the fundamental issues in statistical physics [2-4]. One of the most
prominent examples of power-law behavior is the so-called Zipf's law, which was popularized by
the linguist G. K. Zipf, who observed that it accounts for the frequency of words within
written texts [5], [8]. But this law is extremely common [9] and has been found in the
distribution of populations in city sizes, firm sizes in industrial countries [15], market
fluctuations, money income [17], Internet file sizes [19] or family names [20]. For instance,
if we rank all the cities in a country from the largest (in population size) to the smallest,
Zipf's law states that the probability $p(s_i)$ that a given individual lives in the $i$-th
most populated city ($i = 1, ..., n$) falls off as

$$ p(s_i) = \frac{1}{Z}\, i^{-\gamma}, \qquad (1) $$

with the exponent $\gamma \approx 1$, $Z$ being the normalization constant, i.e.,

$$ Z = \sum_{i \le n} i^{-\gamma}. \qquad (2) $$
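As an illustration of eqs. (1) and (2), the following minimal Python sketch builds the rank distribution for a finite number of states. The function name `zipf_probabilities` and the chosen size are assumptions made for this example only; they do not come from the paper.

```python
def zipf_probabilities(n, gamma=1.0):
    """Return [p(s_1), ..., p(s_n)] with p(s_i) = i**(-gamma) / Z, as in eqs. (1)-(2)."""
    weights = [i ** (-gamma) for i in range(1, n + 1)]
    z = sum(weights)  # normalization constant Z of eq. (2)
    return [w / z for w in weights]


if __name__ == "__main__":
    p = zipf_probabilities(1000)
    print(sum(p))       # total probability, equal to 1 up to rounding
    print(p[0] / p[9])  # ratio between rank 1 and rank 10 when gamma = 1
```

With $\gamma = 1$ the ratio between the probabilities of ranks $i$ and $j$ is simply $j/i$, which is the property usually checked on rank-size plots.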

Although systems exhibiting Zipf-like statistics are clearly different in their constituent
units, the nature of their interactions and their intrinsic structure, most of them share a
few essential commonalities. One is that they are stochastic, far-from-equilibrium systems
changing in time, under mechanisms that prevent them from becoming homogeneous. Within the
context of economic change, for example, wider varieties of goods and attraction for people
are fueled by large developed areas. Increasing returns drive further growth and feedback
between economy and city sizes [21-23]. Moreover, the presence of a scaling law seems fairly
robust through time: in spite of widespread political and social changes, the statistical
behavior of words in written texts, cities or firms has remained the same over decades or
even centuries. Such robustness is remarkable, given that it indicates a large insensitivity
to multiple sources of external perturbation. In spite of their disparate nature, all seem to
rapidly achieve the Zipf's law regime and remain there.

To account for the emergence and robustness of Zipf's law, several mechanisms have been
proposed, including auto-catalytic processes [25-27], extinction dynamics, intermittency
[30, 31], coherent noise, coagulation-fragmentation processes [34], self-organized
criticality [35], communicative conflicts [37], random typewriting [38], multiplicative
dynamics or stochastic processes in systems with interacting units with complex internal
structure [41]. The diverse character of such mechanisms sharing a common scaling exponent
strongly points towards the hypothesis that some fundamental property (beyond a given
specific dynamical mechanism) is at work. Such a universal trend asks for a generic
explanation, which should avoid the use of a particular set of rules.

We address the problem from a very general, 
mechanism-free viewpoint, by studying the statistical
properties of the sequence of successive observations over 
the system. More precisely, our observations can be un- 
derstood as a sequence of symbols of a given alphabet 
(depending on the nature of the system) following some 
probability distribution. The elements of this alphabet 
can be coded in some way, for example in bits. From this
conceptual starting point, we borrow concepts from algo- 
rithmic information theory (AIT) and propose a charac- 
terization of a wide family of stochastic systems, to which 
those systems displaying Zipf's law would belong. Such 
a characterization imposes special features on the behav- 
ior of the entropy, whose study leads us to conclude that, 
under generic mathematical assumptions, Zipf's law is 
the only solution.

The paper is organized as follows: In section II we 
briefly introduce the concept of stochastic object as de- 
fined within the context of AIT and how it helps to under- 
stand our problem. In section III we find the asymptotic 
solutions of the equations derived from the characteri- 
zation provided in section II. Section IV discusses the 
relevance of the obtained results. 



II. ALGORITHMIC COMPLEXITY OF 
STOCHASTIC SYSTEMS 

The cornerstone of our argument is an abstract characterization of the sequence of
observations made on a given system in terms of AIT. The key quantity of such theory is the
so-called Kolmogorov complexity, which is a conceptual precursor of statistical entropy, and
an indicator of the complexity (and predictability) of a dynamical system [50-52]. In a
nutshell, let $x$ be a symbolic string generated by the successive observations of the system
$S$. Its Kolmogorov complexity, $K(x)$, is defined as the length $l(\pi^*)$ (in bits) of the
shortest program $\pi^*$ executed in a universal computer in order to reproduce $x$. This
measure has often been used in statistical physics [53-55], particularly in the context of
symbolic dynamics [51]. In this context, $K$ is known to be maximal for completely disordered
systems, whereas it takes intermediate values when some asymmetry on the probabilities of
appearance of symbols emerges.

Within the framework of statistical physics, a sequence of observations performed over a
given system can be interpreted as a sequence of independent, identically distributed random
variables, where the specific outcomes of the observations are obtained according to a given
probability distribution. In mathematical terms, such a sequence of observations defines a
stochastic object. By definition, the Kolmogorov complexity of a stochastic object, described
by a binary string $x = x_1, ..., x_m$ of length $m$, satisfies [56]:

$$ \lim_{m \to \infty} \frac{K(x)}{m} = \mu, \qquad \mu \in (0, 1]. \qquad (3) $$

In other words, the binary representation of a stochastic object is linearly compressible.
The case where $\mu = 1$ refers to a completely random object, and the string is called
incompressible.
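Kolmogorov complexity itself is not computable, so the linear compressibility of eq. (3) can only be illustrated indirectly. The sketch below uses the size of a zlib-compressed string as a crude upper-bound proxy for $K(x)$; this substitution, the function name `compressed_bits_per_symbol`, and the bias values are assumptions of this example, not part of the original argument.

```python
import random
import zlib


def compressed_bits_per_symbol(q, m=80000, seed=0):
    """Crude proxy for K(x)/m: zlib-compressed size in bits of m biased coin flips, per flip."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < q else 0 for _ in range(m)]
    # Pack 8 binary observations per byte so that an unbiased source is not trivially compressible.
    packed = bytearray()
    for start in range(0, m, 8):
        byte = 0
        for b in bits[start:start + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return 8 * len(zlib.compress(bytes(packed), 9)) / m


if __name__ == "__main__":
    print(compressed_bits_per_symbol(0.5))   # unbiased source: essentially incompressible
    print(compressed_bits_per_symbol(0.05))  # strongly biased source: clearly compressible
```

A general-purpose compressor only bounds the complexity from above, so the printed ratios should be read as qualitative indicators of the two regimes rather than as estimates of $\mu$.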

We can generalize the concept to non-binary strings, whose elements belong to a given set
$\Sigma = \{s_1, ..., s_n\}$, with $|\Sigma| = n$. This is the case of a die, for example,
whose set of outcomes is $\Sigma_{dice} = \{1, 2, 3, 4, 5, 6\}$. If the behavior of the system
is governed by the random variable $X(n)$, the successive observations of our stochastic
system define a sequence of independent, identically $p_n$-distributed random variables
$X_1(n), ..., X_m(n)$ taking values over the set $\Sigma$. The so-called noiseless coding
theorem establishes that the minimum length (in bits) of the string needed to code the event
$s_i$, $l^*(s_i)$, satisfies

$$ l^*(s_i) = -\log p_n(s_i) + O(1). \qquad (4) $$

(Throughout the paper, $\log \equiv \log_2$ unless otherwise indicated.) The average minimum
length will correspond to the minimum length of the code, which is, by definition, the
Kolmogorov complexity. Thus we obtain

$$ \lim_{m \to \infty} \frac{K(X_1(n), ..., X_m(n))}{m} = \sum_{i \le n} p_n(s_i)\, l^*(s_i) = H(X(n)) + O(1), \qquad (5) $$

$H(X(n))$ being the Shannon or statistical entropy [47], namely:

$$ H(X(n)) = -\sum_{i \le n} p_n(s_i) \log p_n(s_i). $$



The completely random case is obtained when, $\forall s_i \in \Sigma$, $p_n(s_i) = 1/n$,
leading to $l^*(s_i) = \log n + O(1)$. This indicates that we need $\approx \log n$ bits to
code any element from $\Sigma$. Therefore, the length in bits of the sequence of $m$
successive observations will be approximately $m \cdot \log n$. For any other distribution,
the average minimum length of the code will be lower than $\log n$. Using our previous result
(3) for the binary case, it is not difficult to see that:

$$ \lim_{m \to \infty} \frac{K(X_1(n), ..., X_m(n))}{m \cdot \log n} = \mu; \qquad \mu \in (0, 1]. \qquad (6) $$

By defining $h(n)$ as the normalized entropy,

$$ h(n) \equiv \frac{H(X(n))}{\log n}, \qquad (7) $$

and from eq. (5), we observe that eq. (6) can be rewritten as $h(n) \approx \mu$;
$\mu \in (0, 1]$.
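The normalized entropy of eq. (7) is straightforward to evaluate for small examples. The following sketch is only an illustration; the function names and the loaded-die probabilities are hypothetical choices made here.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H in bits of a discrete distribution (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(probs):
    """h(n) = H(X(n)) / log2(n), eq. (7); defined for n >= 2 states."""
    return shannon_entropy(probs) / math.log2(len(probs))


if __name__ == "__main__":
    fair_die = [1 / 6] * 6
    loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loaded die
    print(normalized_entropy(fair_die))    # uniform case, the maximum h = 1
    print(normalized_entropy(loaded_die))  # strictly between 0 and 1
```

The uniform case reaches the upper end of the interval $(0, 1]$, while any asymmetry in the probabilities lowers $h(n)$.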

So far we have been concerned with the algorithmic characterization of stochastic systems
for which the size of the configuration space is static. However, we must differentiate the
properties of the systems we want to characterize from a standard stochastic object such as
the ones obtained by tossing a die or a coin. Both generate a bounded number of possible
outcomes (namely, 6 and 2) with an associated probability, whereas those systems exhibiting
power laws lack an a priori constraint on the potential number of available outcomes. These
systems are open concerning the size (or dimensionality) of the configuration space. Let
$X(n)$ be a random variable taking values on $\Sigma$, where $|\Sigma| = n$, with associated
probability distribution $p_n$, where (without any loss of generality) an ordering

$$ p_n(s_1) \ge p_n(s_2) \ge ... \ge p_n(s_n) \qquad (8) $$

is assumed. At a given time, the system satisfies eq. (6), since it is a stochastic object
with a given number of available states. However, we assume that the system changes
(generally growing) while keeping its basic statistical properties stable [15, 24]. Using
eq. (7), condition (6) is replaced by:

$$ \lim_{n \to \infty} h(n) = \mu. \qquad (9) $$

We can replace eq. (9) alternatively by the following statement: for any $\epsilon > 0$ there
exists $n \in \mathbb{N}$ such that, for any $n' > n$:

$$ |h(n') - \mu| < \epsilon. \qquad (10) $$

The main objective of the paper is to find the expected distribution $p_n(s_i)$ consistent
with eq. (10). The case $\mu = 0$ would correspond to systems where, although they grow in
size, the complexity (and thus the statistical entropy) is bounded or grows sublinearly with
$\log n$, a case studied in [51]. Here, we are interested in the intermediate case, where
$\mu \in (0, 1)$. This characterization would depict systems with some balance between
ordering and disordering forces, thereby displaying a dissipation of statistical entropy
proportional to the maximum entropy achievable for the system in equilibrium. Therefore, we
will refer to the problem of finding solutions for eq. (9) as the entropy restriction
problem. A computational test for this result can be illustrated by the model results shown
in fig. (1). The picture shows a spatial snapshot of the local population densities of a
model of urban growth displaying Zipf's law [23]. The normalized entropy evolves towards a
stationary value $\mu \approx 0.65$, consistent with our discussion. This holds even though
the model exhibits wide fluctuations due to its intermittent stochastic dynamics.




FIG. 1: (Color online) An example of the behavior of the normalized entropy for a
multiplicative stochastic process exhibiting Zipf's law. Here we use the model described in
[23] on an 80 x 80 lattice where each node is described by a density of population
$\rho(i, j)$. The rules of the model are very simple: i) At every time step, each node loses
a fraction $\alpha$ of its contents, which is distributed among its four nearest neighbors.
ii) At time $t + 1$ the local population is multiplied, with probability $p$, by a factor
$p^{-1}$. Furthermore, with probability $1 - p$, the population of a node is set to zero.
Additionally, at each step a random number $\eta$ is added to every node. In this way, we
avoid falling into an absorbing state $\rho = 0$. Here we use $0 < \eta < 0.01$,
$\alpha = 1/4$ and $p = 3/4$. This is an extremely simplified (and yet successful) model of
urban population dynamics. A snapshot (for $t = 500$) is shown in (a), where we can
appreciate the wide range of local densities, following Zipf's law (b). If we plot the
evolution of the normalized entropy $\mu$ over time (averaged over $10^2$ replicas) we
observe a convergence towards a stationary value $\mu \approx 0.65$.



III. EMERGENCE OF ZIPF'S LAW IN
STOCHASTIC SYSTEMS

As pointed out in [36], the main difficulty we face in this kind of equation is that we are
not dealing with an extremal problem, since our value of entropy is previously fixed and it
is neither minimum nor maximum, in Jaynes' sense. Thus, classical variational methods, which
have been widely used with great success in statistical mechanics [60-63], do not apply to
our problem, although it has recently been shown that variational approaches using Fisher
information and physically relevant constraints lead to power laws whose exponent can be
close to 1 [64]. We must also take into account that the particular properties of Zipf's law
create an additional difficulty if the studied systems display, a priori, an unbounded number
of possible states. Specifically, we refer to the non-existence of finite moments and of a
normalization constant in the thermodynamic limit. However, as we shall see, these apparently
undesirable properties will be the key to our derivation.



A. Properties of the entropies of a power law

Let us briefly summarize the properties of the entropies of power-law distributed systems,
which will be used to derive the main results of this work (for details, see Appendix A).
Such properties are intimately linked with the behavior of the Riemann zeta function,
$\zeta(\gamma)$ [65]:

$$ \zeta(\gamma) = \sum_{k=1}^{\infty} \frac{1}{k^{\gamma}}. \qquad (11) $$

On the real line, this function is defined in the interval $\gamma \in (1, \infty)$,
displaying a singularity for $\gamma \to 1^+$.

Now, let us suppose that the system contains $n$ states and that the probability of finding
the $i$-th most likely state decays as a power law, i.e., $p_n(s_i) \propto i^{-\gamma}$. For
the sake of simplicity, we will refer to its associated entropy as $H(n, \gamma)$ and to its
normalized counterpart as $h(n, \gamma)$, i.e.:

$$ h(n, \gamma) \equiv \frac{H(n, \gamma)}{\log n}. \qquad (12) $$

The most basic properties concern the global behavior of $H(n, \gamma)$. It is
straightforward to check that $H(n, \gamma)$ is i) a monotonically increasing function of
$n$, and ii) a monotonically decreasing function of $\gamma$. Moreover, the normalized
entropy of Zipf's law of a system with $n$ states converges to 1/2 [66], i.e.,

$$ \lim_{n \to \infty} h(n, 1) = \frac{1}{2}. \qquad (13) $$

We also note that the entropy of a power law with exponent higher than one is bounded, i.e.,
if $\gamma > 1$ is the exponent of our power law, there exists a finite constant
$\phi(\gamma)$ such that:

$$ \lim_{n \to \infty} H(n, \gamma) < \phi(\gamma). \qquad (14) $$

A key consequence of this result is that, if our (unknown) probability distribution is
dominated [67] from some $k$ by some power law with exponent $\gamma > 1 + \delta$ (for any
$\delta > 0$), our entropy will be bounded.

Furthermore, it can be shown that the normalized entropy of a power-law distribution in a
system with $n$ different states, with exponent $\gamma < 1$, converges to 1, i.e.,

$$ \lim_{n \to \infty} h(n, \gamma) = 1. \qquad (15) $$

Consistently, we can conclude that, if an (unknown) probability distribution is not
dominated from any $m$ by a power law with exponent lower than $1 - \delta$ (for any
$\delta > 0$), the normalized entropy of our system will converge to 1.

Using these properties, in the following sections we proceed to derive Zipf's law using two
complementary approaches, namely 1) proposing a power law as the asymptotic solution of
eq. (9) (section IIIB) and 2) assuming that the entropy behaves in a scale-invariant way
(section IIIC).
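The properties summarized in this subsection can be checked numerically for finite sizes. The sketch below evaluates $h(n, \gamma)$ directly from its definition; the function names and the sampled sizes and exponents are choices made for this illustration only.

```python
import math


def power_law_entropy(n, gamma):
    """H(n, gamma) in bits for p_n(s_i) proportional to i**(-gamma), i = 1..n."""
    weights = [i ** (-gamma) for i in range(1, n + 1)]
    z = sum(weights)
    return -sum((w / z) * math.log2(w / z) for w in weights)


def normalized_power_law_entropy(n, gamma):
    """h(n, gamma) = H(n, gamma) / log2(n), eq. (12); requires n >= 2."""
    return power_law_entropy(n, gamma) / math.log2(n)


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        row = [round(normalized_power_law_entropy(n, gamma), 3) for n in (10**2, 10**4, 10**5)]
        print(gamma, row)
```

Because the correction to the limit in eq. (13) decays only like $\log(\log n)/\log n$, the values obtained for $\gamma = 1$ at sizes of this order should be expected to sit noticeably above 1/2 and to approach it slowly.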



B. Power Law Ansatz: Convergence of Exponents to $\gamma = 1$

In this section we make use of the power-law ansatz as a solution of our problem, i.e., we
assume that the solution is a power law with an arbitrary exponent,
$p_n(s_i) \propto i^{-\gamma}$. The objective of this section is to demonstrate that, with
$h(n, \gamma)$ as defined in eq. (12), the following limit holds:

$$ \lim_{n \to \infty} h(n, \gamma) = \theta(\gamma), \qquad (16) $$

$\theta(\gamma)$ being the step function, i.e., $\theta(\gamma) = 1$ if $\gamma < 1$ and
$\theta(\gamma) = 0$ if $\gamma > 1$. It implies that, for large values of $n$, the whole
range of normalized entropies between 0 and 1 is obtained from exponents $\gamma$ arbitrarily
close to $\gamma = 1$ (see fig. 2).

FIG. 2: Normalized entropies of five power-law distributed systems of different size as
functions of the exponent. The curves display 5 different sizes: $n = 500000$ (black
circles), $n = 10000$ (white circles), $n = 10000$ (up triangles), $n = 1000$ (squares) and
$n = 100$ (down triangles), respectively. The most interesting feature of the numerical
computations is the sharp decay of the normalized entropy when the values of the exponent are
close to 1, which implies that a wide range of normalized entropies is obtained by tuning the
exponent of the power-law distribution around unity. Furthermore, we observe that the decay
becomes sharper as the size of the system grows, concentrating an increasing range of
relative entropies near the exponent 1 (grey area).

Let us rewrite the convergence assumptions provided in (9, 10) assuming that our probability
distribution is a power law: for any $\epsilon > 0$ we can find an $n$ such that, for any
$n' > n$, we have an exponent $\gamma(n')$ such that

$$ |h(n', \gamma(n')) - \mu| < \epsilon, \qquad (17) $$

i.e., the sequence of normalized entropies $\mathcal{H}$ associated to the system's growth,
namely

$$ \mathcal{H} = h(1, \gamma(1)), h(2, \gamma(2)), ..., h(k, \gamma(k)), ..., \qquad (18) $$

converges to $\mu$. Below we split the problem into two different scenarios.

1. First case: $\mu < \frac{1}{2}$.

We begin by exploring the following scenario:

$$ \lim_{n \to \infty} h(n, \gamma(n)) = \mu \in \left(0, \tfrac{1}{2}\right). \qquad (19) $$

From equation (13) we can ensure that, for large values of $n$, $\gamma(n) > 1$. Since we
assumed that the sequence $\mathcal{H}$ converges to $\mu$, we can state that, for a given
$\epsilon > 0$, there is an arbitrary $n_1$ such that:

$$ \mu - \epsilon < h(n_1, \gamma(n_1)) < \mu + \epsilon. \qquad (20) $$

We know, from the properties of the entropies of power-law distributed systems, that
$H(n_1, \gamma(n_1)) < \phi(\gamma(n_1))$, where $\phi(\gamma(n_1))$ is some positive, finite
constant (see eq. (14) and the appendix). Then, since $\log x$ is an unbounded, increasing
function of $x$, we can find $n_2 > n_1$ such that

$$ \phi(\gamma(n_1)) < (\mu + \epsilon) \log n_2. \qquad (21) $$

Thus, since $h(n, \gamma)$ is a decreasing function of $\gamma$, we need to find
$\gamma(n_2) < \gamma(n_1)$ such that

$$ \mu - \epsilon' < h(n_2, \gamma(n_2)) < \mu + \epsilon', \qquad (22) $$

with $\epsilon' < \epsilon$, in order to satisfy the entropy restriction. Furthermore, since
$H(n, 1) = \frac{1}{2} \log n + O(\log(\log n))$, we conclude that
$1 < \gamma(n_2) < \gamma(n_1)$. Let us expand this process recursively, thus generating an
infinite decreasing sequence of exponents,

$$ \{\gamma(n_k)\}_{k=1}^{\infty} = \gamma(n_1), ..., \gamma(n_k), ..., \qquad (23) $$

such that, for any $\gamma(n_i) \in \{\gamma(n_k)\}_{k=1}^{\infty}$, $\gamma(n_i) > 1$. We
notice that, for any $a > 0$, we can find an $n_k$ such that, if $n_j > n_k$,

$$ |\gamma(n_j) - 1| < a, \qquad (24) $$

since, for every $\gamma(n_k)$, we always find an $n_j > n_k$ such that

$$ \phi(\gamma(n_k)) < (\mu + \epsilon) \log n_j. \qquad (25) $$


2. Second case: $\mu > \frac{1}{2}$.

Let us now consider the following entropy restriction problem:

$$ \lim_{n \to \infty} h(n, \gamma(n)) = \mu \in \left(\tfrac{1}{2}, 1\right). \qquad (26) $$

From equation (13), we can ensure that, for any $n$, $\gamma(n) < 1$. Furthermore, from
equation (15), we again find a problem close to the one solved above, since for $n_1$ large
enough and $\gamma < 1$, we have:

$$ H(n_1 + 1, \gamma) - H(n_1, \gamma) > \mu (\log(n_1 + 1) - \log n_1). \qquad (27) $$

Now, since we assumed that the sequence $\mathcal{H}$ converges, we can state that, given an
arbitrary step $n_1$,

$$ \mu - \epsilon < h(n_1, \gamma(n_1)) < \mu + \epsilon. \qquad (28) $$

Since $H(n, \gamma)$ is a decreasing function of $\gamma$, we need to find
$\gamma(n_2) > \gamma(n_1)$ such that:

$$ \mu - \epsilon' < h(n_2, \gamma(n_2)) < \mu + \epsilon', \qquad (29) $$

with $\epsilon' < \epsilon$, to satisfy the entropy restriction. However, from eq. (13), we
know that $1 > \gamma(n_2) > \gamma(n_1)$. Proceeding as above, we expand this process, thus
generating an infinite increasing sequence of exponents $\{\gamma(n_k)\}_{k=1}^{\infty}$. By
virtue of equation (13) and equation (15), and taking into account the decreasing behavior of
$h$ as a function of the exponent, we observe that, for any $a > 0$, we can find an $n_k$
such that, if $n_j > n_k$,

$$ |\gamma(n_j) - 1| < a. \qquad (30) $$

In summary, under the power-law ansatz, the only solution for eq. (9), in the limit of large
systems, is $\gamma = 1$, i.e., Zipf's law.



C. Scale invariance Condition

The above power-law ansatz is purely mathematical, and can be replaced by a more physically
realistic assumption. This leads us to the second strategy to solve our problem, which is
based on the assumption that the mechanisms responsible for the growth and stabilization of
the system do not depend on the size of the configuration space and, thus, a partial
observation of the system will also satisfy condition (9). We will refer to this assumption
as the scale invariance condition, and it is formulated as follows. Let
$\Sigma^{(k)} \subset \Sigma$ be the set of the first $k$ elements of $\Sigma$, observing a
labeling consistent with the ordering of probabilities provided in eq. (8) (roughly speaking,
the $k$ most probable elements of $\Sigma$). The random variable which accounts for the
observations of such $k$ elements is denoted $X(k < n)$. We observe that, if $X(n)$ follows
the probability distribution $p_n$, the random variable $X(k < n)$ obeys the following
probability distribution, to be denoted $p_n^k$:



$$ p_n^k(i) \equiv P(s_i \mid i \le k) = \left( \sum_{j \le k} p_n(s_j) \right)^{-1} p_n(s_i). \qquad (31) $$

Thus, if $H(X(k < n))$ is the entropy of $X(k < n)$, its normalized counterpart is defined as
$h(k < n)$:

$$ h(k < n) \equiv \frac{H(X(k < n))}{\log k}. \qquad (32) $$

We remark that these derivations are valid in the limit of large systems, thereby
considering that, at every step, $n$ is arbitrarily greater than $k$. Furthermore, let us
define $\epsilon'$ as:

$$ \epsilon' \equiv |h(k < n) - \mu| + \delta, \qquad (33) $$

$\delta$ being arbitrarily small. Then, the scale invariance assumption for the entropy
states that, for any $n > k' > k$,

$$ |h(k' < n) - \mu| < \epsilon'. \qquad (34) $$

In summary, condition (34) is grounded on the assumption that the entropy restriction works
at all levels of observation. Thus, the partial probability distributions of states we obtain
must reflect the effect of the entropy restriction, introducing a scale invariance of the
normalized entropy of the partial samples of the system.
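The partial distribution of eq. (31) and its normalized entropy, eq. (32), can be computed directly from any ordered distribution. The following sketch is an illustration under the assumption of a pure power law; the function names and the sizes used are hypothetical.

```python
import math


def power_law(n, gamma):
    """p_n(s_i) proportional to i**(-gamma), i = 1..n, already ordered as in eq. (8)."""
    weights = [i ** (-gamma) for i in range(1, n + 1)]
    z = sum(weights)
    return [w / z for w in weights]


def partial_normalized_entropy(p_n, k):
    """h(k < n) of eq. (32), built from the conditional distribution of eq. (31)."""
    head = p_n[:k]
    mass = sum(head)
    p_k = [p / mass for p in head]  # eq. (31)
    entropy_bits = -sum(p * math.log2(p) for p in p_k if p > 0)
    return entropy_bits / math.log2(k)  # eq. (32), needs k >= 2


if __name__ == "__main__":
    p = power_law(100000, 1.0)
    for k in (10, 100, 1000, 10000):
        print(k, partial_normalized_entropy(p, k))
```

For a pure power law the conditional distribution on the first $k$ states is again a power law with the same exponent, so $h(k < n)$ coincides with $h(k, \gamma)$. This is one way to see why the scale invariance condition is compatible with the ansatz of the previous subsection.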

As we saw in the above sections, the decay of this tail is strongly constrained by the
entropy restriction, since only special cases prevent the normalized entropy from falling to
0 or 1. To study in detail how it constrains the tail of the distribution we will work with
the coefficients $f_n(k, k+1)$, defined as:

$$ f_n(i, i+1) \equiv \frac{p_n(s_i)}{p_n(s_{i+1})}, $$

instead of the raw probability distribution, to avoid multiplying factors due to
normalization. Now we observe that, for a given, very large $n$, our probability
distributions $p_n^k$ must be able, as $k$ increases, to unboundedly increase the entropy of
the whole system to reach the global value $H(X(n))$, which lies in the interval
$((\mu - \epsilon) \log n, (\mu + \epsilon) \log n)$. Furthermore, the scale invariance
condition depicted in eq. (34) forces that, as $k$ increases, contributions to the entropy go
neither to 0 nor to $\log(k+1)/\log k$, but lie within this interval. In other words, the sum
defined by the entropies must diverge as $k$ increases over a system where $n$ is arbitrarily
large, whereas the sequence of its normalized versions must converge to $\mu$. The above
derivations concerning the convergence properties of the entropy (see also Appendix A)
clearly state that those properties hold if $p_n$ satisfies, on one hand, for large $i$'s,



$$ f_n(i, i+1) > \left( \frac{i+1}{i} \right)^{(1-\delta)} \qquad (35) $$

to avoid that $h(n) \to 1$. On the other hand, if we want to avoid that $h(n) \to 0$, the
following inequality must hold:

$$ f_n(i, i+1) < \left( \frac{i+1}{i} \right)^{(1+\delta)}. \qquad (36) $$

Therefore, the solution of our problem lies in the range defined by:

$$ \left( \frac{i+1}{i} \right)^{(1+\delta)} > f_n(i, i+1) > \left( \frac{i+1}{i} \right)^{(1-\delta)}. \qquad (37) $$

From the study of the entropies of a power law performed in the previous section, we know
that $\delta$ can be arbitrarily small if the size of the system is large enough. Thus:

$$ f_n(i, i+1) = \frac{p_n(s_i)}{p_n(s_{i+1})} \approx \frac{i+1}{i}, \qquad (38) $$

which leads us to Zipf's law as the unique asymptotic solution:

$$ p_n(s_i) \propto i^{-1}. \qquad (39) $$
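The step from the ratio condition to the rank distribution can be made explicit with a telescoping product. The following is a sketch that treats the approximate ratio $f_n(j, j+1) \approx (j+1)/j$ as holding for every rank, which is an idealization introduced here for clarity:

```latex
\begin{aligned}
\frac{p_n(s_1)}{p_n(s_i)}
  &= \prod_{j=1}^{i-1} \frac{p_n(s_j)}{p_n(s_{j+1})}
   = \prod_{j=1}^{i-1} f_n(j, j+1)
   \approx \prod_{j=1}^{i-1} \frac{j+1}{j}
   = i, \\
p_n(s_i) &\approx \frac{p_n(s_1)}{i} \propto i^{-1}.
\end{aligned}
```

The same telescoping argument applied to bounds of the form $\left(\frac{j+1}{j}\right)^{1 \pm \delta}$ gives $i^{1-\delta} < p_n(s_1)/p_n(s_i) < i^{1+\delta}$, so a finite $\delta$ corresponds to an effective exponent within $\delta$ of 1.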



IV. DISCUSSION

Complex, far-from-equilibrium systems involve a tension between amplifying mechanisms and
negative feedbacks able to buffer the impact of fluctuations. In this paper we have
considered the consequences of such tension in terms of one of its best-known outcomes: the
presence of an inverse scaling law connecting the size of observed events and their rank. The
commonality of Zipf's law in both natural and man-made systems has been a puzzle that has
attracted for years the attention of scientists, sociologists and economists alike. The fact
that such a plethora of apparently unrelated systems display the same statistical pattern
points towards some fundamental, unifying principle.

In this paper we treat complex systems as stochastic systems describable in terms of
algorithmic complexity and thus statistical entropy. A general result from algorithmic
complexity theory is that eq. (3) holds for stochastic systems. Taking this general result as
the starting point, we define a characterization of a wide class of complex systems which
grasps the open nature of many complex systems, summarized in eq. (9). The main achievement
of this equation is that it encodes the concepts of growth and, even more important, the
stabilization of complexity properties at an intermediate point between order and disorder, a
feature observed in many systems displaying Zipf-like statistics. From this equation we
derived Zipf's law as the natural outcome of systems belonging to this class of stochastic
systems.

Our development avoids the classical procedures based on maximization (minimization) of some
functional in order to find the most probable configuration of states, since far from
equilibrium the ensemble formalism, together with Jaynes' maximum entropy principle [60], can
fail due to the open, non-reversible behavior of the systems considered here. Thus we do not
introduce moment constraints, as is usual in equilibrium statistical mechanics [63], but
instead a constraint on the value achieved by the normalized entropy, no matter the scale at
which we observe the system. Both a scaling ansatz and a more general scale invariance
assumption lead to Zipf's law as the unique solution for this problem. We observe that finite
size effects define an interval of exponents around 1, namely $(1 - \delta, 1 + \delta)$,
which could partly explain the variation observed in finite, natural systems. However, it is
true that a system satisfying eq. (9) does not necessarily exhibit Zipf's law. Further work
should explore in depth the physically relevant conditions leading the evolution of Zipf-like
systems in order to remove the mathematical assumptions made in this paper, thereby obtaining
a complete description of them from a completely general, theoretical viewpoint.



Appendix A: Entropic Properties of Power-Law
distributed systems



Consider a system whose behavior is described by the random variable $X(n)$ taking values on
the set $\Sigma = \{s_1, ..., s_n\}$, $|\Sigma| = n$, according to the probability
distribution $p_n(s_i)$. The labeling $i$ of the state is chosen in such a way that
$p_n(s_1) \ge p_n(s_2) \ge ... \ge p_n(s_i) \ge ... \ge p_n(s_n)$.

The Shannon entropy of our system of $n$ states, to be denoted $H(X(n))$, is defined as [58]:

$$ H(X(n)) = -\sum_{k \le n} p_n(s_k) \log p_n(s_k). \qquad (A1) $$

The normalized entropy of the system, to be written $h(n)$, is defined as:

$$ h(n) \equiv \frac{H(X(n))}{\log n}. \qquad (A2) $$
We will work with power-law distributions, by which $p_n(s_i) = \frac{1}{Z} i^{-\gamma}$,
where $n$ is the number of available states and $Z$ the normalization constant, which depends
on the size of the system, $n$. Let us rewrite the function $H(X(n))$ as a function of the
exponent and the size of $\Sigma$, $H(n, \gamma)$. Consistently,

$$ h(n, \gamma) \equiv \frac{H(n, \gamma)}{\log n}. \qquad (A3) $$

This appendix is devoted to deriving five properties of the entropy of power-law distributed
systems.

1. $H(n, \gamma)$ is a continuous, monotonically decreasing function with respect to
$\gamma$ in the range $(0, \infty)$.

Indeed, the dominant term of its derivative is:



dH(n, 7) 



i<n 



< 0. 



(A4) 



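2. The entropy of a power law is a monotonically increasing function of the size of the
system [68].

We want to show that $H(n, \gamma)$ is a monotonically increasing function of $n$. In order
to prove it, we must compute the difference $H(n+1, \gamma) - H(n, \gamma)$. For simplicity,
let us define:

$$ S_n \equiv \sum_{k \le n} \frac{1}{k^{\gamma}}. \qquad (A5) $$

Using the trivial inequality:

$$ \log\left( S_n + \frac{1}{(1+n)^{\gamma}} \right) > \log(S_n), \qquad (A6) $$

we can state that:

$$ \begin{aligned}
H(n+1, \gamma) - H(n, \gamma)
 &= \frac{\gamma}{S_n + \frac{1}{(1+n)^{\gamma}}} \sum_{k \le n+1} \frac{\log k}{k^{\gamma}}
    + \log\left( S_n + \frac{1}{(1+n)^{\gamma}} \right)
    - \frac{\gamma}{S_n} \sum_{k \le n} \frac{\log k}{k^{\gamma}} - \log S_n \\
 &> \gamma \sum_{k \le n} \frac{\log k}{k^{\gamma}}
    \left( \frac{1}{S_n + \frac{1}{(1+n)^{\gamma}}} - \frac{1}{S_n} \right)
    + \frac{\gamma}{S_n + \frac{1}{(1+n)^{\gamma}}} \frac{\log(n+1)}{(1+n)^{\gamma}} \\
 &= \frac{\gamma}{S_n^2 (n+1)^{\gamma} + S_n}
    \left[ S_n \log(n+1) - \sum_{k \le n} \frac{\log k}{k^{\gamma}} \right] \\
 &> 0.
\end{aligned} $$

Finally, it is easy to check that the following properties also hold:

$$ \lim_{\gamma \to \infty} H(n, \gamma) = 0, \qquad (A7) $$

$$ \lim_{\gamma \to 0} H(n, \gamma) = \log n. \qquad (A8) $$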



3. The normalized entropy of Zipf's law of a system with $n$ states
($p_n(s_i) \propto i^{-1}$) converges to 1/2.

We want to show that the sequence

$$ \mathcal{H} = \{h(k, 1)\}_{k=1}^{\infty} = h(1, 1), h(2, 1), ..., h(k, 1), ... \qquad (A9) $$

converges to $\frac{1}{2}$. Let us suppose that $\mathcal{H}$ is a sequence satisfying the
above requirements. Then, the entropy for a given $n$ can be approximated by [66]:

$$ H(n, 1) = \frac{1}{2} \log n + O(\log(\log n)). \qquad (A10) $$

Thus, if $h(n, 1) = H(n, 1)/\log n$, let us define $\epsilon(n)$ as:

$$ \epsilon(n) \equiv \left| h(n, 1) - \frac{1}{2} \right| = \frac{O(\log(\log n))}{\log n}. \qquad (A11) $$

Clearly, $\epsilon(n)$ is strictly decreasing in $n$ and, furthermore,

$$ \lim_{n \to \infty} \epsilon(n) = 0. \qquad (A12) $$
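The leading behavior of the entropy of Zipf's law can be motivated by a short calculation. The following sketch uses natural logarithms and two standard asymptotic sums; $\gamma_E$ denotes the Euler-Mascheroni constant, a symbol introduced here that does not appear in the original text:

```latex
\begin{aligned}
Z &= \sum_{k \le n} \frac{1}{k} = \ln n + \gamma_E + o(1), \qquad
\sum_{k \le n} \frac{\ln k}{k} = \frac{(\ln n)^2}{2} + O(1), \\
H(n,1) &= \frac{1}{Z} \sum_{k \le n} \frac{\ln k}{k} + \ln Z
        = \frac{(\ln n)^2 / 2 + O(1)}{\ln n + O(1)} + \ln\left( \ln n + O(1) \right)
        = \frac{1}{2} \ln n + \ln(\ln n) + O(1).
\end{aligned}
```

Dividing by $\ln n$ gives $h(n,1) = \frac{1}{2} + \frac{\ln(\ln n)}{\ln n} + O\left(\frac{1}{\ln n}\right)$, which makes the slow, logarithmic approach to 1/2 explicit.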



4. The entropy of a power law with exponent higher than 1 is bounded.

Here we demonstrate that the entropy of a power law with exponent higher than 1 is bounded
[69]. Specifically, we assume there exists a pair of positive constants $Z$, $\delta$, such
that:

$$ p_n(s_i) = \frac{1}{Z}\, i^{-(1+\delta)}. \qquad (A13) $$

Then, the sequence $\mathcal{H} = \{h(k, 1 + \delta)\}_{k=1}^{\infty}$ converges to 0.
Indeed, let us first note that:

$$ \lim_{n \to \infty} p_n(s_i) = \frac{1}{\zeta(1+\delta)}\, i^{-(1+\delta)}, \qquad (A14) $$

where

$$ \zeta(1+\delta) = \sum_{k=1}^{\infty} \frac{1}{k^{1+\delta}} \qquad (A15) $$

is the Riemann zeta function [65]. The function is defined by an infinite sum which
converges, on the real line, if $\delta > 0$, i.e.:

$$ \sum_{k=1}^{\infty} \frac{1}{k^{1+\delta}} < \infty; \qquad (A16) $$
otherwise, the sum diverges. Furthermore, the above condition also holds for the following
series:

$$ \sum_{k=1}^{\infty} \frac{\log k}{k^{1+\delta}}. \qquad (A17) $$

Indeed, note that, given an arbitrary $\delta > 0$ there exists a finite number $i^*$ such
that:

$$ i^* = \min \left\{ i : \left( \delta > \frac{\log(\log i)}{\log i} \right) \right\} \qquad (A18) $$

and, if we define the following exponent, $\beta(i^*)$:

$$ \beta(i^*) \equiv 1 + \delta - \frac{\log(\log i^*)}{\log i^*}, \qquad (A19) $$

there exists a finite constant, $\Psi(\delta)$, defined as:

*(*) = £G3i-iVw*))' (A20) 



such that: 



(A21) 



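With the above properties, it is clear that, if there exists a constant
$\phi(1 + \delta) < \infty$ such that:

$$ \lim_{n \to \infty} H(n, 1 + \delta) < \phi(1 + \delta), \qquad (A22) $$

then the entropy of a power law with exponent higher than 1 is bounded. As we shall see, this
follows straightforwardly by checking directly the behavior of $H(n, 1 + \delta)$:

$$ \lim_{n \to \infty} H(n, 1 + \delta) = \frac{1+\delta}{\zeta(1+\delta)} \sum_{k=1}^{\infty} \frac{\log k}{k^{1+\delta}} + \log(\zeta(1+\delta)). $$

Since $H(n, \gamma)$ is an increasing function of $n$, and

$$ \frac{1+\delta}{\zeta(1+\delta)} \sum_{k=1}^{\infty} \frac{\log k}{k^{1+\delta}} + \log(\zeta(1+\delta)) < \infty, \qquad (A23) $$

we can define a constant $\phi(1 + \delta)$,

$$ \phi(1 + \delta) = \lim_{n \to \infty} H(n, 1 + \delta) + \epsilon \qquad (A24) $$

(where $\epsilon$ is any positive, finite constant). Clearly,

$$ H(n, 1 + \delta) < \phi(1 + \delta). \qquad (A25) $$

Thus,

$$ \lim_{n \to \infty} h(n, 1 + \delta) = \lim_{n \to \infty} \frac{H(n, 1 + \delta)}{\log n} \le \lim_{n \to \infty} \frac{\phi(1 + \delta)}{\log n} = 0. $$

Consequence: If an unknown probability distribution is dominated from some $k$ by some power
law with exponent higher than $1 + \delta$, our entropy will be bounded.

Consequence: If an unknown probability distribution is dominated from some $k$ by some power
law with exponent higher than $1 + \delta$, our normalized entropy will tend to 0.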
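The saturation of the entropy for exponents above 1 is easy to observe numerically. The sketch below evaluates $H(n, \gamma)$ for $\gamma = 2$ (that is, $\delta = 1$) at increasing sizes; the function name and the sampled sizes are choices made for this illustration.

```python
import math


def power_law_entropy(n, gamma):
    """H(n, gamma) in bits for p_n(s_i) proportional to i**(-gamma)."""
    weights = [i ** (-gamma) for i in range(1, n + 1)]
    z = sum(weights)
    return -sum((w / z) * math.log2(w / z) for w in weights)


if __name__ == "__main__":
    gamma = 2.0  # corresponds to delta = 1 in the notation of (A13)
    for n in (10, 10**2, 10**3, 10**4, 10**5):
        h_bits = power_law_entropy(n, gamma)
        # H keeps growing with n but levels off, while H / log2(n) keeps shrinking.
        print(n, h_bits, h_bits / math.log2(n))
```

The entropy increases with $n$ yet stays below a finite bound, so dividing by $\log n$ drives the normalized entropy towards 0, in line with the two consequences stated above.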

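5. The normalized entropy of a power-law distribution in a system with $n$ different states,
$p_n$, with exponent lower than 1 converges to 1.

Let us suppose that we have the following probability distribution, with $0 < \delta < 1$:

$$ p_n(s_i) = \frac{1}{Z}\, i^{-(1-\delta)}. \qquad (A26) $$

Note that [66]:

$$ \sum_{k \le n} \frac{1}{k^{1-\delta}} \approx \frac{n^{\delta}}{\delta}. \qquad (A27) $$

Applying directly the definition of entropy,

$$ H(n, 1-\delta) \approx \frac{\delta(1-\delta)}{n^{\delta}} \sum_{k \le n} \frac{\log k}{k^{1-\delta}} + \delta \log n - \log \delta. \qquad (A28) $$

If we compute the limit of $h(n, 1 - \delta)$:

$$ \begin{aligned}
\lim_{n \to \infty} h(n, 1-\delta)
 &= \lim_{n \to \infty} \frac{1}{\log n} \left[ \frac{\delta(1-\delta)}{n^{\delta}} \sum_{k \le n} \frac{\log k}{k^{1-\delta}} + \delta \log n - \log \delta \right] \\
 &= \lim_{n \to \infty} \frac{1}{\log n} \left[ (1-\delta) \log n + \delta \log n + O(1) \right] \\
 &= 1 - \delta + \delta \\
 &= 1.
\end{aligned} $$

Consequence: If our (unknown) probability distribution is not dominated from some $k$ by a
power law with exponent higher than $1 - \delta$, our normalized entropy will converge to 1.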
by a power law with exponent $1 + \delta$ if $(\exists m) : (\forall i >$




[68] The reader could object that this section is unnecessary, since the axiomatic
derivation of the uncertainty function (which we call entropy) assumes that the entropy
increases with the size of the system. However, the explicit statement of this axiom
corresponds to the special case of uniform probabilities [53]. Specifically, the axiom
states that, if we have two systems $A$, $B$ such that $A$ contains $n$ states
$a_1, ..., a_n$ and $B$ contains $n + 1$ states $b_1, ..., b_{n+1}$, then, if
$(\forall i \le n)\, p(a_i) = 1/n$ and $(\forall i \le n + 1)\, p(b_i) = 1/(n + 1)$,
$H(A) < H(B)$. Thus, if we are not dealing with this special case, we need to explicitly
demonstrate that it holds for our purposes.
[69] This derivation is equivalent to the one found in [30], Theorem 8.2. In this theorem,
the authors demonstrate that every infinite distribution with infinite entropy is
hyperbolic, which implies that the distribution is not dominated by a power law with an
exponent higher than 1.



